The use of residuals for model checking has been well-developed in linear regression theory. The residuals are plotted versus some quantity, such as a covariate value, and the observed pattern is used to diagnose possible problems with the fitted model. Some residuals have the additional property of not only indicating problems but also suggesting remedies. That is the pattern of the plotted residuals may suggest an alternative model that fits the data better.
Many of these residuals have been generalized to survival analysis. In addition, the fact that survival data evolves over time, and requires special assumptions such as proportional hazards, makes it necessary to develop additional diagnostic residual methods.
(残差プロットでモデルの診断およびより適切なモデルの推測を行うことができる)
An important tool for assessing the goodness of fit of a model is to compare the censoring indicator (0 for censored, 1 for death) for each subject to the expected value of that indicator under the proportional hazards Cox model. If there are no time dependent covariates and if the survival times are right-censored, this is given by
\[ m_i = \delta_i - \hat{H_0}(t_i) \exp(z'_i \hat{\beta}) \]
These residuals, which originate from the counting process theory underlying the Cox proportional hazards model, sum to 0, range in value from \(-\infty\) to a maximum of 1, and each has an expected value of zero. The residual is essentially the difference between the observed value (1 or 0) of the censoring indicator and its expected value under a particular Cox model.
The asymmetry of the residuals might appear to be a disadvantage, but Therneau and Grambsch, in Chap. 5, show that a plot of these residuals versus the covariate \(z\) can reveal not only discrepancies in the model but also the actual functional form of this covariate. Martingale residuals may be used much as residuals that the reader may be familiar with from linear regression.
(Martingale残差は左右非対称でやや使いにくそうだが、残差分析によりモデルの妥当性および共変量の関数形の検証にも使うことができる!)
However, unlike those, the sum of squares of Martingale residuals cannot be used as a measure of goodness of fit. The “deviance” residual is an alternative that does have this property.
It may defined in terms of the martingale residual as follows:
\[ d_i = sign (m_i) \cdot \{ -2 \cdot [m_i + \delta_i \log(\delta_i - m_i)] \}^{1/2} \]
These residuals are symmetrically distributed with expected value 0 (if the fitted model is correct).
The sum of squares of these residuals is the value of the likelihood ratio test, which is analogous to the deviance from generalized linear model theory.
\(r_{m_i}\)の性質を残しつつ左右対称の分布へ。ただcensorの扱いが代わり、censor時はより強く負へ評価する。
While the properties of deviance residuals might seem preferable to those of martingale residuals, only the latter have the property of showing us the functional form of a covariate. For this reason, in practice, the martingale residuals are more useful.
Example 1:
Consider the pharmacoSmoking dataset, and a fit of the “null” Cox proportional hazards model. A null model is one with no fitted covariates. There is still a partial likelihood, and the model produces martingale residuals which take the form \(m_i = \delta_i - \hat{H_0}(t_i) \exp(z' \beta)\). We may plot these against continuous predictors to get a preliminary assessment of which of these predictors should be in the model, and what form they should take.
(null-modelのMartingale残差と共変量のplotを書くと線形性の評価や関数形の推測ができる!)
We first read in the data and truncate the variable “priorAttempts” at 20, since recorded values of this variable that exceed 20 are not likely to be correct,
prior_attempts_T <- pharmacoSmoking %>%
tbl_df() %>%
mutate(priorAttemptsT = ifelse(priorAttempts > 20, 20, priorAttempts))
We may fit the null model and obtain these residuals as follows:
# create null model
result_null_coxph <- coxph(Surv(ttr, relapse) ~ 1, prior_attempts_T)
# extract Martingale residuals of null model
resid_null_coxph <- residuals(result_null_coxph, type = "martingale")
We next assess the potential relationship of survival to age, prior attempts at quitting, and longest prior period without smoking. We plot these null model residuals versus each of these variables and also versus log transformations of these variables.
In the following, since the smallest value of “prior_attempts_T” and “longestNoSmoke” is zero, we add one to each before taking the log transformation.
We fit a “loess” smooth curve through each set of residuals to better assess the shapes of the relationships, using the loess function.
In order to also get 95 % confidence intervals for these smooth curves, we use the function smoothSEcurve, a simple plotting function that is defined in the appendix.
smoothSEcurve <- function(yy, xx) {
# use after a call to "plot"
# fit a lowess curve and 95% confidence interval curve
# make list of x values
xx.list <- min(xx) + ((0 : 100) / 100) * (max(xx) - min(xx))
# Then fit loess function through the points (xx, yy) # at the listed values
yy.xx <- predict(loess(yy ~ xx), se = T, newdata = data.frame(xx = xx.list))
# The loess curve
lines(yy.xx$fit ~ xx.list, lwd=2)
# lower line
lines(yy.xx$fit - qt(0.975, yy.xx$df) * yy.xx$se.fit ~ xx.list, lty = 2)
# upper line
lines(yy.xx$fit + qt(0.975, yy.xx$df) * yy.xx$se.fit ~ xx.list, lty = 2)
}
The results are in Fig.
# plot age vs Martingale resid
plot(resid_null_coxph ~ prior_attempts_T$age)
smoothSEcurve(resid_null_coxph, prior_attempts_T$age)
title("Martingale residuals\nversus age")
# plot log(age) vs Martingale resid
plot(resid_null_coxph ~ log(prior_attempts_T$age))
smoothSEcurve(resid_null_coxph, log(prior_attempts_T$age))
title("Martingale residuals\nversus log age")
# plot priorAttemptsT vs Martingale resid
plot(resid_null_coxph ~ prior_attempts_T$priorAttemptsT)
smoothSEcurve(resid_null_coxph, prior_attempts_T$priorAttemptsT)
title("Martingale residuals versus\nprior attempts")
In each case, the untransformed variables show considerable non-linearity. Note in particular the first plot, versus age, that shows the same non-linear relationship we modeled directly (with pspline) in Chapter 6. This illustrates that the use of martingale residuals with a null model is an alternative way to identify the form of a non-linear relationship.
(Martingale残差からこれらの共変量はCoxモデルにおいて線形性が成り立つとは考えにくい)
# plot log(priorAttemptsT + 1) vs Martingale resid
plot(resid_null_coxph ~ log(prior_attempts_T$priorAttemptsT + 1))
smoothSEcurve(resid_null_coxph, log(prior_attempts_T$priorAttemptsT + 1))
title("Martingale residuals versus\nlog prior attempts")
# plot longestNoSmoke vs Martingale resid
plot(resid_null_coxph ~ prior_attempts_T$longestNoSmoke)
smoothSEcurve(resid_null_coxph, prior_attempts_T$longestNoSmoke)
title("Martingale residuals versus\nlongest period without smoking")
# plot log(longestNoSmoke + 1) vs Martingale resid
plot(resid_null_coxph ~ log(prior_attempts_T$longestNoSmoke + 1))
smoothSEcurve(resid_null_coxph, log(prior_attempts_T$longestNoSmoke + 1))
title("Martingale residuals versus\nlog of longest period without smoking")
The log-transformation of “LongestNoSmoke” produces a curve that will be easier to adjust for using linear models, whereas with “age” and “priorAttempts”, the log-transformation doesn’t appear to help.
(対数変換すると“LongestNoSmoke”はだいぶ線形性になってきた、ただ“age”と“priorAttempts”はまだ線形性とは言いづらい)
As in the previous chapter, we use the step function to identify a model with low AIC:
# create model with low AIC by step
result_grp_coxph <- coxph(Surv(ttr, relapse) ~ grp, data = prior_attempts_T)
result_step <- step(result_grp_coxph, scope = list(upper = ~ grp + gender + race +
employment + yearsSmoking + levelSmoking +
age + priorAttemptsT + log(longestNoSmoke + 1),
lower = ~ grp))
## Start: AIC=766.32
## Surv(ttr, relapse) ~ grp
##
## Df AIC
## + age 1 762.48
## + log(longestNoSmoke + 1) 1 765.19
## + yearsSmoking 1 765.76
## <none> 766.32
## + employment 2 767.24
## + gender 1 767.43
## + priorAttemptsT 1 767.63
## + levelSmoking 1 768.27
## + race 3 770.72
##
## Step: AIC=762.48
## Surv(ttr, relapse) ~ grp + age
##
## Df AIC
## + employment 2 758.28
## + log(longestNoSmoke + 1) 1 761.99
## <none> 762.48
## + yearsSmoking 1 764.12
## + gender 1 764.20
## + priorAttemptsT 1 764.28
## + levelSmoking 1 764.48
## - age 1 766.32
## + race 3 766.52
##
## Step: AIC=758.28
## Surv(ttr, relapse) ~ grp + age + employment
##
## Df AIC
## <none> 758.28
## + log(longestNoSmoke + 1) 1 758.55
## + yearsSmoking 1 759.80
## + priorAttemptsT 1 759.87
## + gender 1 760.27
## + levelSmoking 1 760.28
## + race 3 761.10
## - employment 2 762.48
## - age 1 767.24
The resulting model is as follows:
result_step
## Call:
## coxph(formula = Surv(ttr, relapse) ~ grp + age + employment,
## data = prior_attempts_T)
##
## coef exp(coef) se(coef) z p
## grppatchOnly 0.60788 1.83654 0.21837 2.784 0.00537
## age -0.03529 0.96533 0.01075 -3.282 0.00103
## employmentother 0.70348 2.02077 0.26929 2.612 0.00899
## employmentpt 0.65369 1.92262 0.32732 1.997 0.04581
##
## Likelihood ratio test=22.03 on 4 df, p=0.0001979
## n= 125, number of events= 89
We may now assess the final model by creating residuals and then plotting them versus the final selected predictors,
# extract Martingale residuals from this low AIC model
resid_step_coxph <- residuals(result_step, type = "martingale")
# plot age vs resid
plot(resid_step_coxph ~ prior_attempts_T$age)
smoothSEcurve(resid_step_coxph, prior_attempts_T$age)
title("Martingale residuals\nversus age")
# plot grp vs resid
plot(resid_step_coxph ~ prior_attempts_T$grp)
title("Martingale residuals\nversus treatment group")
# plot employment vs resid
plot(resid_step_coxph ~ prior_attempts_T$employment)
title("Martingale residuals\nversus employment")
The results, shown in Fig, show that the residuals treatment group and employment are evenly distributed over the values of the covariates. The variable “age” still shows some possible deviation, but it is much improved over the plot for the null model.
(“grp”および“employment”とmodelのMartingale残差に傾向はない。“age”はやや傾向あるかも知らんがnull-modelに比べるとだいぶ改善あり。)
Some subjects may have an especially large influence on the parameter estimates. Since some such influential subjects may indicate a problem with the data, it is helpful for the data analyst to have tools that can identify those subjects.
Case deletion residuals (also called jackknife residuals) serve this purpose. For each subject, a case deletion residual is the difference in the value of the coefficient using all of the data and its value when that subject is deleted from the data set.
(全データで作ったモデル vs 1つデータを取り除いて作ったモデルの両者の係数を比較して、係数決定に大きな影響を及ぼしているような外れ値がないか確認する)
Using the pharmacoSmoking data set, we illustrate computation of these residuals using “age” as a predictor.
First, we find the coefficient for age (the fourth coefficient) using all the data:
# create cox model and extract coeff of age
result_coxph_all <- coxph(Surv(ttr, relapse) ~ grp + employment + age, data = pharmacoSmoking)
(coef_all <- result_coxph_all$coefficients[4])
## age
## -0.03528934
Next, for each subject in turn, we delete the ith subject from the survival time “ttr”, censoring indicator “relapse”, and covariates “grp”, “employment”, and “age”, and fit a Cox model to this reduced data set.
The results for the ith subject go into “lst_result_coxph_i”.
lst_result_coxph_i <- map(1:nrow(pharmacoSmoking), ~ coxph(Surv(ttr, relapse) ~ grp + employment + age,
data = slice(pharmacoSmoking, -.)))
We extract the age coefficient (the fourth element of the vector of coefficient estimates) into “coef_i” and compute the jackknife residual as the difference of “coef_i” and “coef_all”:
coef_i <- lst_result_coxph_i %>%
map("coefficients") %>% # extract coefficients
map_dbl(4) # extract age cofficients (4th element of coefficients)
# compute the jackknife residual
jackbeta_vec <- coef_all - coef_i
We may plot these residuals versus the patient id’s, “type=’h’” causes the residuals to be plotted as spikes. Finally, “abline(h=0)” plots a horizontal line through 0.
index_obs <- 1:nrow(pharmacoSmoking)
plot(jackbeta_vec ~ index_obs, type = "h",
xlab = "Observation", ylab = "Change in coefficient for age", cex.axis = 1.3, cex.lab = 1.3)
abline(h = 0)
The plot is shown here. There we see that no single patient changes the estimate of the “age” coefficient by more than 0.003, which is less than 10 % of the value of that coefficient. Still, we see that patients 46 and 68 have the most influence over the parameter estimate for age, and one may check these data points to ensure that there are no errors in recording the data.
df_jackbeta <- tibble(id = index_obs,
jackbeta = jackbeta_vec)
df_jackbeta %>%
arrange(jackbeta) %>%
tail(2)
## # A tibble: 2 x 2
## id jackbeta
## <int> <dbl>
## 1 68 0.00251
## 2 46 0.00296
A more convenient way to obtain case deletion residuals is using the residuals function with “type = ‘dfbeta’”.
These residuals are approximations to case-deletion residuals that require less numerical computation by eliminating all but one interaction in the partial likelihood maximization.
The simplest way to compute such as residual is exactly as outlined above for case-deletion residuals, but one starts the search for a maximum partial likelihood at \(\beta = 0\) and then only permit a single iteration of the coxph function for each subject.
Since the first iteration of the coxph function yields an estimate of the coefficient that is near the final value, these residuals should be nearly as effective at identifying influential subjects as the case deletion residuals.
These may be computed using the residuals function as follows:
resid_dfbeta <- residuals(result_coxph_all, type = "dfbeta")
df_dfbeta <- tibble(index = 1:nrow(pharmacoSmoking),
resid = resid_dfbeta[, 4])
plot(resid ~ index, data = df_dfbeta,
type = "h", xlab = "Observation", ylab = "Change in coefficient")
abline(h = 0)
The resulting dfbeta residuals plot is nearly identical to that before plot. While the reduction in computation time is of minimal value in most applications, using the residuals function to obtain dfbeta residuals has the advantage that it is slightly easier to use it to produce multiple plots for all of the coefficients—one only need select the corresponding element of “resid_dfbeta” for each coefficient in the model.
A modification of the dfbeta residual is to standardize it by an estimate of its standard error. In the residuals function, we obtain these residuals exactly as above, but using the option “type = ‘dfbetas’”. In this example this refinement makes no perceptible difference in which cases are found to be influential (see Exercise 1), but potentially could be of value with other data sets.
The proportional hazards assumption is key to the construction of the partial likelihood, since it is this property that allows one to cancel out the baseline hazard function from the partial likelihood factors.
If one has a binary predictor variable, such as experimental vs. standard treatment, what this assumption means is that the hazard functions are proportional, and hence that the log-hazards are separated by a constant at all time points.
Similarly, a categorical variable with many levels will result in parallel log hazard functions.
This assumption is at best an approximation in practice, and minor violations are unlikely to have major effects on inferences on model parameters.
For this reason, formal hypothesis tests of the proportional hazards assumption are often of limited value. Still, it is useful to assess, in a particular data set, if this assumption is reasonable, and what one can do if it is not.
Here we will examine some commonly used assessment methods.
If we are comparing survival times between two groups, there is a simple plot that can help us assess the proportional hazards assumption. Under this assumption, we have
\[ S_1(t) = [S_0(t)]^{\exp(\beta)} \]
where \(\exp(\beta)\) is the proportional hazards constant. Taking logs of both sides, we have
\[ \log[S_1(t)] = \exp(\beta) \cdot \log[S_0(t)] \]
Since the survival functions are less than 1, their logarithms are negative. Thus, we must negate them before we take a second logarithm,
\[ \log\{-\log[S_1(t)]\} = \beta + \log\{-\log[S_0(t)]\} \]
The function \(g(u) = \log\{-\log(u)\}\) is called a complementary log-log transformation, and has the effect of changing the range from \((0, 1)\) for \(u\) to \((-\infty, \infty)\) for \(g(u)\).
A plot of \(g[S_1(t)]\) and \(g[S_0(t)]\) versus \(t\) or \(\log(t)\) will yield two parallel curves separated by \(\beta\) if the proportional hazards assumption is correct.
We may illustrate this with pancreatic2.
There we found that the Prentice-modification test showed a stronger group difference than did the log-rank test, and the suggestion was that this difference was caused by non-proportional hazards. We can investigate the proportional hazards assumption as follows, with results shown in Fig.
# change data.frame to tibble
df_panc <- pancreatic2 %>%
tbl_df()
# divide data frame by stage and fit k_m model
lst_result_km_panc <- df_panc %>%
split(.$stage) %>% # divide data by stage
map(~ survfit(Surv(pfs) ~ 1, data = .)) %>% # fit k-m model
map(tidy) %>% # made df of k-m model
map(~ mutate(., log_time = log(time), # mutate log time
loglog_surv = log(-log(estimate)))) # mutate log-log survival
# plot log-log survival
# loglog_surv vs log_time at stage == "LA"
plot(lst_result_km_panc[["LA"]]$loglog_surv ~ lst_result_km_panc[["LA"]]$log_time,
type = "s", col = "blue", lwd = 2, xlab = "Log time", ylab = "Complementary log-log survival")
# loglog-surv vs log-time at stage == "M"
lines(lst_result_km_panc[["M"]]$loglog_surv ~ lst_result_km_panc[["M"]]$log_time,
type = "s", col = "red", lwd = 2)
legend("bottomright", legend = c("Locally advanced", "Metastatic"),
col = c("blue","red"), lwd = 2)
The curves are clearly not parallel.
However, one problem with this approach is that we don’t have a clear way to assess statistical significance (significant parallel or not). This is a critical issue here due to the small sample size, particularly in the locally advanced group.
Schoenfeld residual plots provide a useful way to assess this assumption. To see how they are derived, recall the partial log-likelihood function,
\[ l(\beta) = \displaystyle \sum_{i \in D} \lbrace \log(\psi_i) - \log(\sum_{k \in D} \psi_k)\rbrace = \sum_{i \in D} \{ z_i\beta - \log(\sum_{k \in D} e^{z_k\beta})\} \]
and its derivative, which is the score function.
\[ l'(\beta) = \displaystyle \sum_{i \in D} \lbrace z_i - \sum_{k \in D} z_k \cdot p(\beta, z_k) \rbrace \] where
\[ p(\beta, z_k) = \frac{e^{z_k\beta}}{\displaystyle \sum_{j \in R_k}e^{z_j\beta}} \]
The Schoenfeld residuals are the individual terms of the score function, and each term is the observed value of the covariate for patient i minus the expected value \(E(Z_i) = \bar{z}(t_i)\), which is a weighted sum, with weights given by \(p_k(\beta)\), of the covariate values for subjects at risk at that time. Each weight may be viewed as the probability of selecting a particular person from the risk set at time \(t_i\).
For an estimate \(\hat{\beta}\), the residual for the ith failure time is
\[ \hat{r}_i = z_i - \displaystyle \sum_{j \in R_k} (z_k\cdot p(\hat{\beta}, z_k)) = z_i - \bar{z}(t_i) \]
A plot of theses residuals versus the covariate \(z_i\) will yield a pattern of points that are centered at zero, if the proportional hazards assumption is correct. Note that these residuals are defined only for the failure (and not the censoring) times. If there are multiple covariates, then one obtains a series of residuals for each covariate.
The “expected” value of the covariate at that time is the weighted sum (weighted by pi). To clarify how we compute these, let us return to our simple example from Chap 4 comparing two sets of three observations.
We first need \(\hat{\beta}\), the log hazard ratio estimate, which we may compute as follows:
# make sample data frame
df_sample <- tribble(
~time, ~delta, ~treatment,
6, 1, 0,
7, 0, 0,
10, 1, 1,
15, 1, 0,
19, 0, 1,
25, 1, 1
)
# fit cox model
coxph_sample <- coxph(Surv(time, delta) ~ treatment, data = df_sample)
# show coef
coxph_sample$coef
## treatment
## -1.326129
We see that \(\hat{\beta} = -1.326\).
To compute he Schoenfeld residuals, we first compute the weights as follows:
Next, we compute the expected values, and then the residuals:
At the first failure time (\(t_i = 6\)) there are three at risk in the control group and three in the treatment group. The expected value of the covariate is weighted sum of the three control covariate values, \(z_i = 0\), the weights being \(1/[3+3\exp(-1.326)]\), and the three treatment covariate values, \(z_i = 1\) with weights \(\exp(-1.326)/[3+3\exp(-1.326)]\).
This works out to \(0.2098\), and the residual is \(0-0.2098 = -0.2908\).
The remaining residuals are computed analogously. These residuals may be computed in R as follows:
(resid_unscaled <- residuals(coxph_sample, type = "schoenfeld"))
## 6 10 15 25
## -0.2098004 0.5566351 -0.3468347 0.0000000
Gramsch and Therneau proposed scaling each residual by an estimate of its variance. This scaled residual may be conveniently approximated as follows:
\[
r^{\ast}_i = r_i \cdot d \cdot var(\hat{\beta})
\] where \(d\) is he total number of deaths, and \(var(\hat{\beta})\) is the variance of the parameter estimate. (If there is more than one covariate, then this is a covariance matrix.)
We may compute these as follows:
# unscaled Schoenfeld-residual
resid_unscaled
## 6 10 15 25
## -0.2098004 0.5566351 -0.3468347 0.0000000
# scaled Schonfeld-residual
(resid_scaled <- resid_unscaled * as.numeric(coxph_sample$var) * sum(df_sample$delta))
## 6 10 15 25
## -1.313064 3.483776 -2.170712 0.000000
Therneau and Gramsch showed that, if the hazard ratio is a function of \(t, \beta(t)\), then the expected value of the scaled residuals is
\[ E(r^{\ast}_i) \approx \beta + \beta(t) \] so that an approximate estimate of \(\beta(t)\) may be obtained by adding the estimate \(\hat{\beta}\) from the Cox proportional hazards model (which assumes proportional hazards) to the standardized residuals, and plotting these versus time, or the log of time.
resid_expect <- resid_scaled + coxph_sample$coef
df_resid <- filter(df_sample, delta == 1) %>% mutate(resid = resid_expect)
ggplot(df_resid, aes(time, resid)) +
geom_point() +
geom_smooth() + # this data is too small!
expand_limits(y = c(-5, 5), x = 5.5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 5.905
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 9.095
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 227.86
## Warning in sqrt(sum.squares/one.delta): 計算結果が NaN になりました
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small.
## fewer data values than degrees of freedom.
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at 5.905
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 9.095
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other
## near singularities as well. 227.86
## Warning in stats::qt(level/2 + 0.5, pred$df): 計算結果が NaN になりました
This may be done more conveniently using the cox.zph function, which yields the same residuals directly,
# determine Sch-residuals
resid_sch <- cox.zph(coxph_sample, transform = "km")
# $y component yields the sum of the scaled residuals and the parameter estimate from the Cox model.
resid_sch$y
## treatment
## 6 -2.639193
## 10 2.157647
## 15 -3.496841
## 25 -1.326129
# plot resid vs time
plot(resid_sch)
The benefits of using Schoenfeld residuals become especially apparent when we apply the method to the pancreatic data. We compute the scaled Schoenfeld residuals as follows:
# fit cox model by stage
coxph_panc <- coxph(Surv(pfs, status) ~ stage, data = df_panc)
# determine Sch-redid
resid_sch_panc <- cox.zph(coxph_panc, transform = "km")
The “transform” option specifies that the time axis is scaled to conform with Kaplan- Meier-transformed time.
We may plot the residuals, and a smooth curve through them using a technique called “loess”, as follows
# plot Sch-resid
plot(resid_sch_panc)
The shape of the smoothed (loess) curve is an estimate of the difference parameter as a function of time, which appears to be constant or slightly increasing initially, followed by a steady decline after about 2 months.
Also given is a 95% confidence band for the smooth curve. The hypothesis test for a constant \(\beta\), which is a test of the proportional hazards function, may be obtained by fitting a straight line to the residuals plotted versus, yielding a p-value of 0.0496:
resid_sch_panc
## rho chisq p
## stageM -0.328 3.86 0.0496
Alternatively, variations of this test may be obtained by plotting \(\beta(t)\) versus time or versus other transformations of time. The cox.zph function offers a “rank” option, where the time axis is ordered by the ranks of the times, and an “identity” option, where the time variable is untransformed.
The results using these transformations are as follows:
c("km", "rank", "identity") %>%
purrr::set_names() %>%
map(~ cox.zph(coxph_panc, transform = .))
## $km
## rho chisq p
## stageM -0.328 3.86 0.0496
##
## $rank
## rho chisq p
## stageM -0.33 3.89 0.0486
##
## $identity
## rho chisq p
## stageM -0.197 1.39 0.239
The rank transformation yields a similar p-value to what we found with the “km” transformation. The “identity” transform results in a much higher p-value.
This is most likely due to the fact that deaths occur more sparsely at larger times, and as a result residuals at these larger times hold a great deal of influence on the fitted regression line.
Thus, this transform is likely not a preferred one when failure times are not uniformly spaced over time.
A wide range of tests for proportional hazards that have been developed may be viewed as linear regressions of Schoenfeld residuals versus various transformations of time. See Therneau and Grambsch for further discussion of this point.
# fit cox model
(cox_final_pharmaco <- coxph(Surv(ttr, relapse) ~ grp + employment + age, data = pharmacoSmoking))
## Call:
## coxph(formula = Surv(ttr, relapse) ~ grp + employment + age,
## data = pharmacoSmoking)
##
## coef exp(coef) se(coef) z p
## grppatchOnly 0.60788 1.83654 0.21837 2.784 0.00537
## employmentother 0.70348 2.02077 0.26929 2.612 0.00899
## employmentpt 0.65369 1.92262 0.32732 1.997 0.04581
## age -0.03529 0.96533 0.01075 -3.282 0.00103
##
## Likelihood ratio test=22.03 on 4 df, p=0.0001979
## n= 125, number of events= 89
# name of coefs
coefs <- c("grppatchOnly", "employmentother", "employmentpt", "age")
# case deletion
## make function calculating case_deletion_coefs
deletion_coefs <- function(x) {
cox <- coxph(Surv(ttr, relapse) ~ grp + employment + age, data = pharmacoSmoking[-x, ])
return(coefficients(cox_final_pharmaco) - coefficients(cox))
}
## make function calculating "dfbeta" or "dfbetas" coeffs
beta_coefs <- function(type) {
df_coefs <- residuals(cox_final_pharmaco, type = type) %>%
as_data_frame()
names(df_coefs) <- coefs
df_coefs %<>%
mutate(id = 1:nrow(pharmacoSmoking)) %>%
gather(var, value = type, -id)
return(df_coefs)
}
# calculate coefs by case_deletion, dfbeta and dfbetas
df_coefs <- map(1:nrow(pharmacoSmoking), deletion_coefs) %>%
reduce(bind_rows) %>%
mutate(id = 1:nrow(pharmacoSmoking)) %>%
gather(var, deletion, -id)
df_dfbeta <- beta_coefs("dfbeta") %>%
rename(dfbeta = type)
## Warning: `as_data_frame()` is deprecated, use `as_tibble()` (but mind the new semantics).
## This warning is displayed once per session.
df_dfbetas <- beta_coefs("dfbetas") %>%
rename(dfbetas = type)
# bind_rows by deletion, dfbeta and dfbetas
df_coefs_all <- list(df_coefs, df_dfbeta, df_dfbetas) %>%
reduce(inner_join, by = c("id", "var"))
# plot deletion_coefs vs id
ggplot(df_coefs_all) +
geom_segment(aes(x = id, xend = id, y = deletion, yend = 0)) +
geom_hline(yintercept = 0) +
facet_wrap(~ var, nrow = 2, scale = "free_y")
# plot dfbeta_coefs vs id
ggplot(df_coefs_all) +
geom_segment(aes(x = id, xend = id, y = dfbeta, yend = 0)) +
geom_hline(yintercept = 0) +
facet_wrap(~ var, nrow = 2, scale = "free_y")
# plot dfbetas_coefs vs id
ggplot(df_coefs_all) +
geom_segment(aes(x = id, xend = id, y = dfbetas, yend = 0)) +
geom_hline(yintercept = 0) +
facet_wrap(~ var, nrow = 2, scale = "free_y")
# plot deletion_coefs vs dfbeta_coef
ggplot(df_coefs_all, aes(deletion, dfbeta)) +
geom_point() +
facet_wrap(~ var, nrow = 2, scales = "free")
# plot deletion_coefs vs dfbetas_coefs
ggplot(df_coefs_all, aes(deletion, dfbetas)) +
geom_point() +
facet_wrap(~ var, nrow = 2, scales = "free")
# plot dfbeta_coefs vs dfbetas_coefs
ggplot(df_coefs_all, aes(dfbeta, dfbetas)) +
geom_point() +
facet_wrap(~ var, nrow = 2, scales = "free")
# fit most fitted model, OS vs CXCL17P
(cox_cxcl17 <- coxph(Surv(OS, Death) ~ CXCL17P, data = hepatoCellular))
## Call:
## coxph(formula = Surv(OS, Death) ~ CXCL17P, data = hepatoCellular)
##
## coef exp(coef) se(coef) z p
## CXCL17P 0.0024403 1.0024433 0.0006236 3.913 9.1e-05
##
## Likelihood ratio test=11.88 on 1 df, p=0.0005689
## n= 227, number of events= 97
# Martingale-residuals
martin_cxcl17 <- residuals(cox_cxcl17)
# plot Martingale-residual vs
plot(martin_cxcl17 ~ hepatoCellular$OS, xlab = "OS", ylab = "resid")
smoothSEcurve(martin_cxcl17, hepatoCellular$OS)
# plot case-deletion residuals
df_deletion <- data_frame(index = 1:nrow(hepatoCellular),
deletion_cxcl17 = residuals(cox_cxcl17, type = "dfbeta"))
## Warning: `data_frame()` is deprecated, use `tibble()`.
## This warning is displayed once per session.
plot(deletion_cxcl17 ~ index, data = df_deletion,
type = "h", xlab = "Observation", ylab = "Change in coefficient")
# identify outlying points
hepatoCellular %>%
left_join(df_deletion, by = c("Number" = "index")) %>%
arrange(desc(abs(deletion_cxcl17))) %>%
select(Number, deletion_cxcl17) %>%
head()
## Number deletion_cxcl17
## 1 102 -1.305867e-04
## 2 112 -1.013163e-04
## 3 89 -8.488820e-05
## 4 59 7.986602e-05
## 5 31 7.679248e-05
## 6 110 -7.349816e-05
# Schoenfeld residuals and check the proportional hazard assumption
(zph_cxcl17 <- cox.zph(cox_cxcl17))
## rho chisq p
## CXCL17P 0.0462 0.21 0.647
# p.value > 0.05 and proportional hazard assumption is not rejected
plot(zph_cxcl17)